-
Notifications
You must be signed in to change notification settings - Fork 622
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Try nuking ShardLayout::V0 #12313
base: master
Are you sure you want to change the base?
Try nuking ShardLayout::V0 #12313
Conversation
can you guys do something like |
It fails for me actually, that's not great. I typically run it on the whole workspace and just filter to the tests that I want. Also we use
|
If you feel like fixing it, go for it. It looks like it's only a matter of adding some dependencies to the cargo file. |
JFYI this PR is marked as draft, please make it as ready for review when it is. |
It seems like this is expected behavior. If it's not bothering anyone else, not sure if it needs fixing. And it could be easily mitigated by adding an |
02b02f4
to
7f64b44
Compare
core/primitives/src/shard_layout.rs
Outdated
} | ||
|
||
/// Construct a layout with given number of shards | ||
pub fn of_num_shards(num_shards: NumShards, version: ShardVersion) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got a better idea for the fn name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe multi_shard
, just to mach the single_shard
one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was my first thought but it could also be used to create a single-shard layout, so I changed my mind. But if you like that name I'm also down with it. :)
nearcore/src/config.rs
Outdated
@@ -1087,7 +1075,7 @@ pub fn create_localnet_configs_from_seeds( | |||
.map(|seed| InMemorySigner::from_seed("node".parse().unwrap(), KeyType::ED25519, seed)) | |||
.collect::<Vec<_>>(); | |||
|
|||
let shard_layout = ShardLayout::v0(num_shards, 0); | |||
let shard_layout = ShardLayout::of_num_shards(num_shards, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would cause some sanity check to fail as you could see from the CI logs. It seems like some json parsing issue. Not sure whether if you'd like to keep it as it was or to update the config somewhere else to make it work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's try updating the config and if it doesn't work leave as is.
let error_message = format!("{}", error).to_lowercase(); | ||
tracing::info!(target: "test", "error message: {}", error_message); | ||
assert!(error_message.contains("shard")); | ||
let _res = env.clients[0].process_chunk_state_witness(witness, witness_size, None, signer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a panic from get_shard_index()
after switching to V2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah that's pretty bad. Feel free to either:
- Fix it (may be complicated / lots of code if you need to add error handling)
- Leave as is but put a TODO(wacban) in there instead of FIXME and I will have a look.
- Make the default shard layout V1 (hopefully this works?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try 3 (should probably work from the look of the code) which seems like a nice middle ground before finishing transition to V2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks nice, answered some questions
// FIXME eagr what should be the default? | ||
#[default(ShardLayout::v0(1, 0))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally it should use the single_shard
method that returns the most recent (today it's V2) shard layout.
nit: The convention here seems to be to use the default_
function to provide the default value.
core/primitives/src/shard_layout.rs
Outdated
} | ||
|
||
/// Construct a layout with given number of shards | ||
pub fn of_num_shards(num_shards: NumShards, version: ShardVersion) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe multi_shard
, just to mach the single_shard
one?
let error_message = format!("{}", error).to_lowercase(); | ||
tracing::info!(target: "test", "error message: {}", error_message); | ||
assert!(error_message.contains("shard")); | ||
let _res = env.clients[0].process_chunk_state_witness(witness, witness_size, None, signer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah that's pretty bad. Feel free to either:
- Fix it (may be complicated / lots of code if you need to add error handling)
- Leave as is but put a TODO(wacban) in there instead of FIXME and I will have a look.
- Make the default shard layout V1 (hopefully this works?)
nearcore/src/config.rs
Outdated
@@ -1087,7 +1075,7 @@ pub fn create_localnet_configs_from_seeds( | |||
.map(|seed| InMemorySigner::from_seed("node".parse().unwrap(), KeyType::ED25519, seed)) | |||
.collect::<Vec<_>>(); | |||
|
|||
let shard_layout = ShardLayout::v0(num_shards, 0); | |||
let shard_layout = ShardLayout::of_num_shards(num_shards, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's try updating the config and if it doesn't work leave as is.
tools/database/src/corrupt.rs
Outdated
@@ -21,7 +21,6 @@ impl CorruptStateSnapshotCommand { | |||
let mut store_update = store.store_update(); | |||
// TODO(resharding) automatically detect the shard version | |||
let shard_layout = match self.shard_layout_version { | |||
0 => ShardLayout::v0(1, 0), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you keep this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good,
I think serde doesn't like (de)serializing maps with non-string keys, like the ones in V2 and it breaks the tests. Feel free to fallback to V1 is it's too crazy to fix in this PR.
core/primitives/src/shard_layout.rs
Outdated
#[test] | ||
fn test_shard_layout_v0() { | ||
let num_shards = 4; | ||
let shard_layout = ShardLayout::v0(num_shards, 0); | ||
let mut shard_id_distribution: HashMap<ShardId, _> = | ||
shard_layout.shard_ids().map(|shard_id| (shard_id.into(), 0)).collect(); | ||
let mut rng = StdRng::from_seed([0; 32]); | ||
for _i in 0..1000 { | ||
let s: Vec<u8> = (&mut rng).sample_iter(&Alphanumeric).take(10).collect(); | ||
let s = String::from_utf8(s).unwrap(); | ||
let account_id = s.to_lowercase().parse().unwrap(); | ||
let shard_id = account_id_to_shard_id(&account_id, &shard_layout); | ||
assert!(shard_id < num_shards); | ||
*shard_id_distribution.get_mut(&shard_id).unwrap() += 1; | ||
} | ||
let expected_distribution: HashMap<ShardId, _> = [ | ||
(ShardId::new(0), 247), | ||
(ShardId::new(1), 268), | ||
(ShardId::new(2), 233), | ||
(ShardId::new(3), 252), | ||
] | ||
.into_iter() | ||
.collect(); | ||
assert_eq!(shard_id_distribution, expected_distribution); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please keep this one, the V0 may still be used when replaying some very old blocks.
Then I guess it needs a custom de/serializer that converts the keys to strings and back. I'll give it a shot if it's not too complicated. |
JFYI I had a look at the test failure in CI. It seems like something somewhere has the shard layout version hard coded to 0 where in your PR you (correctly) use the provided version. It's a bit wild, I'll keep digging. |
nearcore/src/config.rs
Outdated
} else { | ||
ShardLayout::v0_single_shard() | ||
}; | ||
let shards = ShardLayout::multi_shard(num_shards, 3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To fix the runtime-params-estimator
test you can to set the version here to 0. It's suboptimal and definitely buggy but I don't think it's worth properly debugging this rather old test framework.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh my it breaks a bunch of other tests. I guess it will be easier to fix it here after all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To fix the
runtime-params-estimator
test you can to set the version here to 0.
done
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #12313 +/- ##
==========================================
- Coverage 71.19% 71.15% -0.05%
==========================================
Files 839 839
Lines 169743 169831 +88
Branches 169743 169831 +88
==========================================
- Hits 120851 120844 -7
- Misses 43633 43717 +84
- Partials 5259 5270 +11
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
The failing test could be fixed by adding some feature flags. But that's from master, does it need to be fixed here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, thank you for this contribution! Just a few final nits. Most are optional, I only really care about restoring the assertion in the resharding test.
shards_split_map: None, | ||
shards_parent_map: None, | ||
version, | ||
}) | ||
} | ||
|
||
/// Return a V0 Shardlayout |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you mark it as deprecated? I don't know how to do this properly in rust, if it's not straight forward then a comment should do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about marking ShardLayout::V0
as deprecated? This way any usage of V0
would raise a deprecation warning including calling v0()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about both? :)
// Shard layouts V0 and V1 are rejected. | ||
assert!(ReshardingEventType::from_shard_layout( | ||
&ShardLayout::v0_single_shard(), | ||
block, | ||
prev_block | ||
) | ||
.is_err()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we keep this?
@@ -2294,7 +2294,7 @@ fn test_protocol_version_switch_with_shard_layout_change() { | |||
epoch_manager.get_epoch_info(&epochs[1]).unwrap().protocol_version(), | |||
new_protocol_version - 1 | |||
); | |||
assert_eq!(epoch_manager.get_shard_layout(&epochs[1]).unwrap(), ShardLayout::v0_single_shard(),); | |||
assert_eq!(epoch_manager.get_shard_layout(&epochs[1]).unwrap(), ShardLayout::single_shard(),); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mini nit: remove the trailing comma
let id_to_index_map = | ||
layout.id_to_index_map.iter().map(|(k, v)| (k.to_string(), *v)).collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is completely fine but I would be tempted to write a generic function that converts a Map<ShardId, T> to Map<String, T> and use it for all the maps in the shard layout. We're looking at whooping potential savings of ~2 lines of code so up to you if you think it's worth it :)
let id_to_index_map = layout | ||
.id_to_index_map | ||
.into_iter() | ||
.map(|(k, v)| Ok((k.parse::<u64>()?.into(), v))) | ||
.collect::<Result<_, Self::Error>>()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto about generic function for this but here it may actually make some sense because it's less trivial logic.
impl TryFrom<SerdeShardLayoutV2> for ShardLayoutV2 { | ||
type Error = Box<dyn std::error::Error + Send + Sync>; | ||
|
||
fn try_from(layout: SerdeShardLayoutV2) -> Result<Self, Self::Error> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mini nit: May unpack the layout as first step and then use the unpacked values directly? It may be a bit prettier and it would be more obvious that there isn't unnecessary cloning.
} | ||
} | ||
|
||
impl<'de> serde::Deserialize<'de> for ShardLayoutV2 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think the convention is to call the lifespan 'a
.
shards_split_map: None, | ||
shards_parent_map: None, | ||
version, | ||
}) | ||
} | ||
|
||
/// Return a V0 Shardlayout |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about marking ShardLayout::V0
as deprecated? This way any usage of V0
would raise a deprecation warning including calling v0()
.
} | ||
|
||
/// Can be used to construct a multi-shard layout, mostly for test purposes | ||
pub fn multi_shard(num_shards: NumShards, version: ShardVersion) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about n_shard()
in the sense of creating an N-shard layout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a fan tbh. How about just new
or new_test
?
I tried the pre-merge tests and unfortunately some are failing. Those are the most expensive tests that only run before merging to master. I tried a simple debug but I couldn't fix it easily. I'm afraid we may need to restore kv_runtime to use v0 for now to make it pass. I'm testing this change again on a fork from your PR: |
No description provided.